A Pattern Decomposition (PD) Algorithm for Finding All Frequent Patterns in Large Datasets

Authors

Qinghua Zou

Wesley W. Chu

David B. Johnson

Henry Chiu

Abstract

Efficient algorithms for mining frequent patterns are crucial to many tasks in data mining. Since the Apriori algorithm was proposed in 1994, several methods have been proposed to improve its performance. However, most still adopt its candidate set generation-and-test approach. We propose a pattern decomposition (PD) algorithm that can significantly reduce the size of the dataset on each pass, making it more efficient to mine frequent patterns in a large dataset. The proposed algorithm avoids the costly process of candidate set generation and saves time by reducing the dataset. Our empirical evaluation shows that the algorithm outperforms Apriori by one order of magnitude and is faster than FP-tree. Further, PD is more scalable than both Apriori and FP-tree.
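The abstract contrasts PD with Apriori's candidate set generation-and-test approach. The following minimal Python sketch is offered only as a point of reference for that approach. It is a generic Apriori illustration, not the authors' PD algorithm. The function name `apriori` and the absolute-count `min_support` parameter are illustrative assumptions. The sketch shows that every pass first generates candidates and then rescans the whole dataset to test them. That repeated full scan is the cost PD addresses by shrinking the dataset on each pass.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Minimal Apriori sketch (candidate generation-and-test), shown only to
    illustrate the approach PD avoids; min_support is an absolute count
    (an illustrative assumption, not from the paper).
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Generate: join frequent (k-1)-itemsets whose union has size k.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) != k:
                    continue
                # Prune: every (k-1)-subset must itself be frequent.
                if all(frozenset(sub) in frequent
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)

        # Test: scan the entire dataset again to count each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1

    return result
```

As an example, take the five transactions {a, b, c}, {a, b}, {a, c}, {b, c}, {a, b, c} with `min_support=3`. The sketch reports the three single items (support 4 each) and the three pairs (support 3 each) as frequent. It drops {a, b, c}, whose support is only 2.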

Similar resources

Using Pattern Decomposition Methods for Finding All Frequent Patterns in Large Datasets

YAFIMA: Yet Another Frequent Itemset Mining Algorithm

Efficient discovery of frequent patterns from large databases is an active research area in data mining with broad applications in industry and deep implications in many areas of data mining. Although many efficient frequent-pattern mining techniques have been developed in the last decade, most of them assume relatively small databases, leaving extremely large but realistic datasets out of reac...

Finding All Frequent Patterns Starting from the Closure

Pattern Decomposition Algorithm for Data Mining Frequent Patterns

A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is a new emerging field in data mining which has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds minimum threshold. The basic HUIM problem does not consider length of itemsets in its utility measurement and utility values tend to become higher for itemsets containing more items...

Publication date: 2001
